Linear Time Algorithm for Projective Clustering

Authors

  • Hu Ding
  • Jinhui Xu
Abstract

Projective clustering is a problem of both theoretical and practical importance and has received a great deal of attention in recent years. Given a set of points P in R^d, projective clustering is to find a set F of k lower-dimensional j-flats so that the average distance (or squared distance) from the points in P to their closest flats is minimized. Existing approaches to this problem are mainly based on adaptive/volume sampling or core-set techniques, which suffer from several limitations. In this paper, we present the first uniform random sampling based approach for this challenging problem and achieve linear-time solutions for three cases: general projective clustering, regular projective clustering, and L_τ sense projective clustering. For the general projective clustering problem, we show that for any given small numbers 0 < γ, ε < 1, our approach first removes γ|P| points as outliers and then determines k j-flats to cluster the remaining points into k clusters with an objective value no more than (1+ε) times the optimal value for all points. For regular projective clustering, we demonstrate that when the input points satisfy some reasonable assumption, our approach for the general case can be extended to yield a PTAS for all points. For L_τ sense projective clustering, we show that our techniques for both the general and regular cases can be naturally extended to the L_τ sense projective clustering problem for any 1 ≤ τ < ∞. Our results are based on several novel techniques, such as slab partition, ∆-rotation, symmetric sampling, and recursive projection, and can be easily implemented for applications.

arXiv:1204.6717v2 [cs.CG] 12 Sep 2012
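To make the objective in the abstract concrete, the following is a minimal sketch of how the cost of a given set of flats could be evaluated. It is not the authors' algorithm: it only scores candidate flats and does not search for them. The names `Flat`, `dist_to_flat`, `clustering_cost`, `example_points`, and `example_flats` are hypothetical, each flat is assumed to be given as a point on the flat plus j orthonormal direction vectors, and outlier removal is assumed to mean dropping the floor(γ|P|) points farthest from their closest flat.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Assumed representation: a j-flat is (a point on the flat, j orthonormal directions).
Flat = Tuple[Vector, List[Vector]]


def dist_to_flat(p: Vector, flat: Flat) -> float:
    """Euclidean distance from point p to the given flat."""
    origin, basis = flat
    diff = [pi - oi for pi, oi in zip(p, origin)]
    residual = list(diff)
    # Remove the component of diff that lies inside the flat.
    for b in basis:
        coeff = sum(d * bi for d, bi in zip(diff, b))
        residual = [r - coeff * bi for r, bi in zip(residual, b)]
    return math.sqrt(sum(r * r for r in residual))


def clustering_cost(points: Sequence[Vector], flats: Sequence[Flat],
                    gamma: float = 0.0, tau: float = 1.0) -> float:
    """Average of dist**tau to the closest flat over the retained points.

    Assumes a non-empty point set and 0 <= gamma < 1; the
    floor(gamma * |P|) farthest points are treated as outliers.
    """
    dists = sorted(min(dist_to_flat(p, f) for f in flats) for p in points)
    keep = len(dists) - int(gamma * len(dists))
    kept = dists[:keep]
    return sum(d ** tau for d in kept) / len(kept)


# Toy instance in R^2 with k = 2 lines (j = 1): the x-axis and the line y = 5.
example_points = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5),
                  (0.0, 5.0), (3.0, 5.5), (10.0, 2.0)]
example_flats = [((0.0, 0.0), [(1.0, 0.0)]),
                 ((0.0, 5.0), [(1.0, 0.0)])]

if __name__ == "__main__":
    print(clustering_cost(example_points, example_flats))             # all points
    print(clustering_cost(example_points, example_flats, gamma=0.2))  # one outlier dropped
    print(clustering_cost(example_points, example_flats, gamma=0.2, tau=2.0))
```

Setting `tau=2.0` gives the squared-distance variant, and other values of `tau` in [1, ∞) correspond to the L_τ sense objective mentioned in the abstract.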

Related Resources

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

A near-linear algorithm for projective clustering integer points

We consider the problem of projective clustering in Euclidean spaces of non-fixed dimension. Here, we are given a set P of n points in R^d and integers j ≥ 1, k ≥ 0, and the goal is to find j k-subspaces so that the sum of the distances of each point in P to the nearest subspace is minimized. Observe that this is a shape fitting problem where we wish to find the best fit in the L1 sense. Here we ...

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

In past decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...

OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM

This paper presents an efficient hybrid method, namely fuzzy particle swarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzy clustering problem, especially for large sizes. When the problem becomes large, the FCM algorithm may result in uneven distribution of data, making it difficult to find an optimal solution in a reasonable amount of time. The PSO algorithm does find ago...

Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach

The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...

Publication date: 2012